9 research outputs found

    Visus: An Interactive System for Automatic Machine Learning Model Building and Curation

    While the demand for machine learning (ML) applications is booming, there is a scarcity of data scientists capable of building such models. Automatic machine learning (AutoML) approaches have been proposed to help with this problem by synthesizing end-to-end ML data processing pipelines. However, these systems follow a best-effort approach, and a user in the loop is needed to curate and refine the derived pipelines. Since domain experts often have little or no expertise in machine learning, easy-to-use interactive interfaces that guide them throughout the model building process are necessary. In this paper, we present Visus, a system designed to support the model building process and the curation of ML data processing pipelines generated by AutoML systems. We describe the framework used to ground our design choices and a usage scenario enabled by Visus. Finally, we discuss the feedback received in user testing sessions with domain experts. Comment: Accepted for publication in the 2019 Workshop on Human-In-the-Loop Data Analytics (HILDA'19), co-located with SIGMOD 2019.

    Unwind: Interactive Fish Straightening

    The ScanAllFish project is a large-scale effort to scan all the world's 33,100 known species of fishes. It has already generated thousands of volumetric CT scans of fish species, which are available on open access platforms such as the Open Science Framework. To achieve the scanning rate required for a project of this magnitude, many specimens are grouped together into a single tube and scanned all at once. The resulting data contain many fish that are often bent and twisted to fit into the scanner. Our system, Unwind, is a novel interactive visualization and processing tool that extracts, unbends, and untwists volumetric images of fish with minimal user interaction. Our approach enables scientists to interactively unwarp these volumes to remove the undesired torque and bending using a piecewise-linear skeleton extracted by averaging isosurfaces of a harmonic function connecting the head and tail of each fish. The result is a volumetric dataset of an individual, straight fish in a canonical pose defined by the marine biologist expert user. We have developed Unwind in collaboration with a team of marine biologists; our system has been deployed in their labs and is presently being used for dataset construction, biomechanical analysis, and the generation of figures for scientific publication.

    AlphaD3M: Machine Learning Pipeline Synthesis

    Peer reviewed. We introduce AlphaD3M, an automatic machine learning (AutoML) system based on meta reinforcement learning using sequence models with self play. AlphaD3M is based on edit operations performed over machine learning pipeline primitives, which provides explainability. We compare AlphaD3M with state-of-the-art AutoML systems (Autosklearn, Autostacker, and TPOT) on OpenML datasets. AlphaD3M achieves competitive performance while being an order of magnitude faster, reducing computation time from hours to minutes, and is explainable by design.

    Concentric RadViz: visual exploration of multi-task classification

    The discovery of patterns in large data collections is a difficult task. Visualization and machine learning techniques have emerged as a way to facilitate data analysis, providing tools to uncover relevant patterns in the data. This paper presents Concentric RadViz, a general-purpose class visualization system that takes into account multi-class, multi-label and multi-task classifiers. Concentric RadViz uses a force attenuation scheme, which minimizes cluttering and ambiguity in the visual layout. In addition, the user can add concentric circles to the layout in order to represent classification tasks. Our validation results and the application of Concentric RadViz to two real collections suggest that this tool can reveal important data patterns and relations. In our application, the user can interact with the visualization by selecting regions of interest according to specific criteria and changing projection parameters. FAPESP (#2011/22749-8, #2012/17961-0, #2012/24801-0, #2014/09546-9, #2014/18665-1); CNPq (#132239/2013-2, #305796/2013-5, #302643/2013-3)

    Visualization of similarities in song data sets

    Music collections are widely available on the internet and, thanks to increasing storage and bandwidth capacity, users can now access an almost unlimited number of songs. This leads to a growing demand for automated methods for organizing, retrieving and processing music data. Information visualization is a research area that enables the visual analysis of large data sets and is therefore a valuable tool for the exploration of music libraries. In this thesis, methodologies for the development of two music visualization techniques are proposed. The first, Similarity Graph, enables the exploration of data sets in terms of hierarchical similarities. The second, Concentric RadViz, represents the data in terms of classification tasks and enables users to alter the visualization according to their interests. Both techniques are able to reveal interesting structures in the data, facilitating its understanding and exploration.

    Um método para reconstrução de superfícies baseado em nuvem de pontos visando representações em multirresolução (A method for point-cloud-based surface reconstruction aimed at multiresolution representations)

    The representation of real objects in virtual environments has applications in many areas, such as cartography, mixed reality and reverse engineering. These objects can be generated in two ways: manually, with CAD (Computer Aided Design) tools, or automatically, by means of surface reconstruction techniques. The simpler the 3D model, the easier it is to process and store. Multiresolution reconstruction methods can generate polygonal meshes at different levels of detail and, to improve the response time of an application, objects far from the observer can be represented with few details, while more detailed models are used only for nearby objects. This work presents a new approach to multiresolution surface reconstruction that is particularly suited to noisy and low-definition data, for example, point clouds captured with the Microsoft Kinect sensor.

    StatCast Dashboard: Exploration of Spatiotemporal Baseball Data
